engine: add support for finding a plan's affected shards #8681
vmg wants to merge 1 commit into vitessio:main
Signed-off-by: Vicent Marti <vmg@strn.cat>
This would not have worked, and not only because of the computation cost: many primitives are interdependent and need the results of their input primitive to determine which shard the current primitive should send its query to. There are also cases like auto_increment sequences, where the check yields a value that says the query will go to a particular shard, but that value is lost when the query is actually executed, because execution fetches the next sequence value again. The other approach we talked about was to keep buffering at the tabletgateway level, with the buffering event indicating whether to execute the query once the event is complete (in the case of reparenting) or to re-execute the plan (in the case of resharding). This also has an issue when the plan touches lookup vindexes for an insert or update. For example, a select query will currently fail if name_id_map is a unique lookup vindex, because it tries to map the input to multiple shards.
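
The sequence problem described above can be illustrated with a small sketch. Everything here is hypothetical and not taken from Vitess: `sequence`, `shardFor`, `predictShard` and `executeInsert` are toy names, and the modulo "hash" only stands in for a real vindex.

```go
package main

import "fmt"

// sequence is a hypothetical stand-in for an auto_increment sequence:
// every call to Next hands out a fresh value that cannot be returned.
type sequence struct{ last int64 }

func (s *sequence) Next() int64 {
	s.last++
	return s.last
}

// shardFor is a toy mapping from an id to one of n shards.
func shardFor(id, n int64) int64 {
	return id % n
}

// predictShard "dry-runs" an insert by drawing a sequence value
// in order to find out which shard the row would land on.
func predictShard(s *sequence, n int64) int64 {
	return shardFor(s.Next(), n)
}

// executeInsert performs the real insert and draws its own value,
// so the value seen during prediction is already gone.
func executeInsert(s *sequence, n int64) int64 {
	return shardFor(s.Next(), n)
}

func main() {
	seq := &sequence{}
	predicted := predictShard(seq, 2) // draws value 1
	actual := executeInsert(seq, 2)   // draws value 2
	fmt.Println("predicted shard:", predicted, "actual shard:", actual)
}
```

In this toy setup the prediction and the real execution disagree, because the value that drove the prediction has been consumed by the time the insert runs.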
The best way I see is that, instead of re-executing the complete plan, we put retries inside the engine primitives, so that they re-execute only the specific failed queries that the tabletgateway asks to retry with a new shard destination.
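
A minimal sketch of what such a per-query retry could look like, under stated assumptions: `retryError`, `gateway`, `executeOnShards` and `fakeGateway` are hypothetical names invented for illustration, not Vitess APIs, and the sketch assumes the gateway names a single new destination for each failed query.

```go
package main

import (
	"errors"
	"fmt"
)

// retryError is a hypothetical error a gateway could return to ask
// the caller to retry one query against a different shard.
type retryError struct{ newShard string }

func (e *retryError) Error() string { return "retry on shard " + e.newShard }

// gateway is a hypothetical stand-in for the tabletgateway.
type gateway interface {
	Execute(shard, query string) (string, error)
}

// executeOnShards sends query to every shard in turn. When the gateway
// asks for a retry, only that one query is re-sent to the new shard;
// results already collected from other shards are kept.
func executeOnShards(gw gateway, shards []string, query string, maxRetries int) ([]string, error) {
	results := make([]string, 0, len(shards))
	for _, shard := range shards {
		target := shard
		for attempt := 0; ; attempt++ {
			res, err := gw.Execute(target, query)
			if err == nil {
				results = append(results, res)
				break
			}
			var re *retryError
			if !errors.As(err, &re) || attempt >= maxRetries {
				return nil, err // not retryable, or out of retries
			}
			target = re.newShard
		}
	}
	return results, nil
}

// fakeGateway redirects every query for shard "-80" to shard "-40"
// and records each shard it was asked to reach.
type fakeGateway struct{ calls []string }

func (g *fakeGateway) Execute(shard, query string) (string, error) {
	g.calls = append(g.calls, shard)
	if shard == "-80" {
		return "", &retryError{newShard: "-40"}
	}
	return "ok@" + shard, nil
}

func main() {
	gw := &fakeGateway{}
	res, err := executeOnShards(gw, []string{"-80", "80-"}, "select 1", 1)
	fmt.Println(res, err, gw.calls)
}
```

The point of the design is that the retry is local to the failed query: the query for `80-` is sent exactly once, and nothing upstream of this primitive has to run again.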
Description
I just had a very productive unproductive morning! After discussing the next steps for our buffering work with @harshit-gangal, he suggested that I implement a `GetExecShards` method on all our plan `Primitive` types: this would allow us to figure out all the shards that would be affected by executing a given plan, so the new buffering code knows whether a given plan would require buffering because a given shard is currently undergoing a disruption event.
After implementing the whole logic for all our primitive types, I quickly noticed that this is not a feasible step forward: for many of the most common primitives, figuring out the reachable shards for the plan is essentially as expensive as actually executing the plan itself (in fact, for some primitives, we need to actually execute parts of the plan in order to accurately gather the reachable shards for the full execution).
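
To make the cost argument concrete, here is a rough sketch. It does not reproduce the code in this PR: the `Primitive` interface is heavily simplified, the `GetExecShards` signature is an assumption, and `fixedRoute`, `dependentRoute` and `toyShard` are hypothetical types and helpers.

```go
package main

import (
	"fmt"
	"sort"
)

// Primitive is a simplified, hypothetical plan primitive.
type Primitive interface {
	Execute() ([]int64, error)
	GetExecShards() ([]string, error) // assumed signature
}

// fixedRoute always targets one known shard, so reporting it is cheap.
type fixedRoute struct {
	shard string
	rows  []int64
	execs int // number of times Execute has run
}

func (r *fixedRoute) Execute() ([]int64, error) {
	r.execs++
	return r.rows, nil
}

func (r *fixedRoute) GetExecShards() ([]string, error) {
	return []string{r.shard}, nil
}

// dependentRoute chooses its shards from the rows its input returns.
type dependentRoute struct {
	input   Primitive
	shardOf func(id int64) string
}

func (d *dependentRoute) Execute() ([]int64, error) {
	return d.input.Execute()
}

// GetExecShards has no choice but to execute the input primitive:
// the target shards are only known once the input rows are known.
func (d *dependentRoute) GetExecShards() ([]string, error) {
	seen := map[string]bool{}
	inputShards, err := d.input.GetExecShards()
	if err != nil {
		return nil, err
	}
	for _, s := range inputShards {
		seen[s] = true
	}
	ids, err := d.input.Execute() // as expensive as running the plan
	if err != nil {
		return nil, err
	}
	for _, id := range ids {
		seen[d.shardOf(id)] = true
	}
	shards := make([]string, 0, len(seen))
	for s := range seen {
		shards = append(shards, s)
	}
	sort.Strings(shards)
	return shards, nil
}

// toyShard is a made-up range mapping over two shards.
func toyShard(id int64) string {
	if id < 100 {
		return "-80"
	}
	return "80-"
}

func main() {
	lookup := &fixedRoute{shard: "-80", rows: []int64{5, 250}}
	plan := &dependentRoute{input: lookup, shardOf: toyShard}
	shards, err := plan.GetExecShards()
	fmt.Println(shards, err, "input executions:", lookup.execs)
}
```

In this sketch, merely asking the dependent primitive for its shards already runs the input query once, which is the cost described above.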
I'm opening this PR nonetheless for further discussion, and as a future reference. Although the functionality is complete and working, I'm not sure there's a compelling reason to have it merged.
cc @harshit-gangal @deepthi
Related Issue(s)
Checklist
Deployment Notes